Abstract

In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute $F_k$ (for $k > 2$)
in space complexity $O(\mathrm{polylog}(n, m) \cdot n^{1-2/k})$, which is optimal up to (large) poly-logarithmic factors in
$n$ and $m$, where $m$ is the length of the stream and $n$ is the upper bound on the number of distinct elements
in a stream. The best known lower bound for large moments is $\Omega(\log(n)\, n^{1-2/k})$. A follow-up work of
Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic factors of Indyk and
Woodruff to $O(\log^2(m) \cdot (\log n + \log m) \cdot n^{1-2/k})$. Further reduction of poly-log factors has been an elusive
goal since 2006, when the Indyk and Woodruff method seemed to hit a natural "barrier." Using our simple
recursive sketch, we provide a different yet simple approach to obtain an $O(\log(m) \log(nm) \cdot (\log\log n)^4 \cdot n^{1-2/k})$
algorithm for constant $\epsilon$. (Our bound is, in fact, somewhat stronger: the $(\log\log n)$ term can
be replaced by any constant number of log iterations instead of just two or three, thus approaching $\log^* n$.)
Our bound also works for non-constant $\epsilon$ (for details see the body of the paper). Further, our algorithm
requires only 4-wise independence, in contrast to existing methods that use pseudo-random generators for
computing large frequency moments.



1 Introduction 



The celebrated paper of Alon, Matias and Szegedy [1] defined the following streaming model:

Definition 1.1. Let $m, n$ be positive integers. A stream $D = D(n, m)$ is a sequence of size $m$ of integers
$p_1, \dots, p_m$, where $p_i \in \{1, \dots, n\}$. A frequency vector is a vector of dimensionality $n$ with non-negative
entries $f_i$, $i \in [n]$, defined as:

$$f_i = |\{j : 1 \le j \le m,\ p_j = i\}|.$$

Definition 1.2. A $k$-th frequency moment of $D$ is defined by $F_k(D) = \sum_{i \in [n]} f_i^k$. Also, $F_\infty = \max_{i \in [n]} f_i$.
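To make Definitions 1.1 and 1.2 concrete, the following minimal sketch computes the frequency vector and $F_k$ exactly for a tiny stream. It stores all counts, so it is only a reference computation and not a small-space streaming algorithm; the function names and the example stream are illustrative assumptions.

```python
from collections import Counter


def frequency_vector(stream, n):
    """Return [f_1, ..., f_n], where f_i is the number of occurrences of i."""
    counts = Counter(stream)
    return [counts.get(i, 0) for i in range(1, n + 1)]


def frequency_moment(stream, n, k):
    """Return F_k(D) = sum_i f_i^k for k >= 1, computed exactly."""
    return sum(f ** k for f in frequency_vector(stream, n))


def f_infinity(stream, n):
    """Return F_infinity(D) = max_i f_i."""
    return max(frequency_vector(stream, n))


# Illustrative stream with m = 6 and n = 4 (element 4 never appears).
example_stream = [1, 3, 3, 2, 3, 1]
```

Here the frequency vector is $(2, 1, 3, 0)$, so $F_3 = 8 + 1 + 27 = 36$.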

Alon, Matias and Szegedy [1] initiated the study of approximating frequency moments with sublinear
memory. Their surprising and fundamental results imply that for $k \le 2$ it is possible to approximate $F_k$ with
polylogarithmic space, and that polynomial space is necessary for $k > 2$. Today, research on frequency
moments is one of the central directions for streaming; many important discoveries have been made since [1].
The body of relevant work is extensive.

For small $k \le 2$, a long line of papers culminated in the recent optimal results:

• $k = 0$: In their award-winning paper, Kane, Nelson and Woodruff [24] gave an optimal-space solution.

• $0 < k < 2$: Kane, Nelson and Woodruff [23], and later Kane, Nelson, Porat and Woodruff [22], gave
optimal-space solutions.

• $k = 2$: The famous sketch of Alon, Matias and Szegedy [1] is, in fact, optimal.

For large k > 2, after years of tremendous effort by the theory community, with important intermediate 
results, the state of the art is as follows: 

• $k > 2$ [Lower bounds:] The lower bound of $\Omega(n^{1-2/k})$ on space complexity was shown by Bar-Yossef,
Jayram, Kumar and Sivakumar [2], and Chakrabarti, Khot and Sun [10]. Recently, the lower bound of
$\Omega(\log(n) \cdot n^{1-2/k})$ was announced by Jayram and Woodruff (see the last page of the SODA 2010
presentation of Monemizadeh and Woodruff [26]).

• $k > 2$ [Upper bounds:] Indyk and Woodruff, in their ground-breaking paper [19], first presented a
two-pass algorithm with space complexity $O\left(\frac{1}{\epsilon^{12}} \cdot (\log^2 n)(\log^6 m) \cdot n^{1-2/k}\right)$ and then showed how
their two-pass algorithm can be converted to a one-pass algorithm with additional poly-log multiplicative
factors. The method of Indyk and Woodruff [19] was subsequently improved in 2006 by Bhuvanagiri,
Ganguly, Kesh and Saha to achieve $O\left(\frac{k^2}{\epsilon^{2+4/k}} \cdot (\log^2 m) \cdot (\log n + \log m) \cdot n^{1-2/k}\right)$ space
complexity with one pass. To the best of our knowledge, this is the best bound known to date.

Main Technical Challenge: No progress was made on the problem of large frequency moments since the
2006 work of Bhuvanagiri, Ganguly, Kesh and Saha described above, due to the following "barrier": the large frequency moments represent the
case of implicit vectors that cannot be sketched, at least directly. That is, no linear computation is known
(unlike the case for the small sketches) that would give a good approximation for the entire vector. In fact,
every algorithm that achieves $O(n^{1-2/k})$ memory bits boils down to the Indyk and Woodruff approach.
Moreover, this is also true for algorithms for other implicit objects. Thus, it might be necessary to not only
improve the existing bounds, but also to come up with new methods for computing estimates of implicit
vectors.

Our Results: This is exactly what we do in this paper. We give a new, recursive method of
computing implicit vectors that also improves the upper bounds for large frequency moments. We improve the
bound of Bhuvanagiri, Ganguly, Kesh and Saha from $O(k^2 \epsilon^{-2-(4/k)} \log^2(m) \log(nm)\, n^{1-2/k})$ to at least
$O(k^2 \epsilon^{-2-(4/k)} (\log\log(n))^4 \log(m) \log(nm)\, n^{1-2/k})$. In fact, we give an even better bound. For any constant
$t$ we achieve:

$$O\left(\frac{k^2}{\epsilon^{2+(4/k)}}\, g_t(n) \log(m) \log(nm)\, n^{1-2/k}\right)$$

space complexity, where:

$$g_0(n) = n \quad \text{and} \quad g_t(n) = \log(g_{t-1}(n)).$$

For constant $t$ and $\epsilon$, we can further improve our bound to $O(\log(n) \log(n \log(m)) \cdot g_t(n) \cdot n^{1-2/k})$. (Thus,
this is a nearly quadratic improvement of the possible ratio between upper and lower bounds compared to the
recently announced $\Omega(\log(n)\, n^{1-2/k})$ lower bound of Jayram and Woodruff.)

Our reduction requires only pairwise independence, in contrast to the full independence that previous
approaches need. Eliminating the need for total randomness is an important challenge for streaming; see,
e.g., [23]. We obtain an algorithm that needs only 4-wise independence and thus does not need Nisan's
pseudorandom generators [29]. Finally, we note that our proofs are elementary, along the lines of AMS-type
proofs.

An Alternative Perspective of Our Results: Many fundamental problems in streaming can be seen as
computing an $L_1$ approximation of implicit vectors. For instance, the frequency moment $F_k$ can be seen as the $L_1$
norm of a vector with entries $f_i^k$. Except for small moments (i.e., $k \le 2$), no sketching (i.e., linear
transformation) algorithms were known in the past. That is, all previous methods for computing $F_k$ for $k > 2$ resorted
to non-linear computations, such as medians to boost the probability that heavy hitters will contribute.

We give a recursive sketching algorithm for estimating within $(1 \pm \epsilon)$ the $L_1$ norm of an implicit
$n$-dimensional vector of non-negative values, where the algorithm is not given such a vector explicitly, but is
only allowed access through a "heavy hitters" oracle. Unlike all previous methods, our recursive sketching
algorithm is a linear transformation (to heavy hitters); it requires $O(\log n)$ calls to a heavy hitters oracle and
yields a $(1 \pm \epsilon)$ approximation to $L_1$ with constant probability. We note that our algorithm can be viewed as a
random linear transformation on an implicit vector to heavy hitters, and thus gives a new dimension reduction
method. Note that our dimension reduction does not contradict the impossibility result of Brinkman and
Charikar, since our dimension reduction method preserves only the norm of the implicit vector and not
pairwise distances between vectors. Yet, our method is sufficient for multiple streaming applications where
we typically care about the norm of a single implicit vector. Thus, we believe that our method might be
useful beyond approximating large frequency moments. In particular, it can be applied to other functions and
implicit objects such as matrices.

Informal Ideas: Let us describe, very informally, the fundamental approach of Indyk and Woodruff [19].
They split the frequency vector into "layers," where each layer contains all entries with frequencies between,
e.g., $\gamma^i$ and $\gamma^{i+1}$ for a carefully chosen $\gamma > 1$. Then they approximate the contribution of each layer by
sampling the stream and by finding the heavy elements that contribute to the layer. Their elegant analysis
shows that such a procedure ensures a good approximation with high probability.

We also use the connection between frequency moments and heavy hitters discovered by Indyk and
Woodruff. However, we do not use the layers method; we employ recursion instead. For streaming
applications, recursion can be helpful if it is possible to reduce computations to a single instance of a smaller
problem. This is the approach that we take. More specifically, we show that, given an algorithm for "heavy
hitters," it is possible to reduce such a problem on a vector of size $n$ to a single computation on a random
vector of size approximately $n/2$.

This simple observation follows from elementary arguments such as the Chebyshev or Hoeffding inequality.
We then employ this observation recursively and show that $\log(n)$ recursive calls give an algorithm that
already matches the bounds of Bhuvanagiri, Ganguly, Kesh and Saha. Further, it is possible to reduce the number of recursive calls from $\log(n)$ to
$\log\log(n)$ by applying the same argument, but stopping after $O(\log\log(n))$ steps. At depth $O(\log\log(n))$
of the recursion, the number of positive frequencies in the corresponding vector is polylogarithmically smaller
than $n$, with constant probability. Thus, any algorithm that works in $\mathrm{polylog}(n, m)\, n^{1-2/k}$ space will
approximate such a vector "for free." Employing such an algorithm at the bottom of the $\log\log(n)$ recursion reduces
the $\log(n)$ factor to a $\mathrm{poly}(\log\log(n))$ factor. Further, the same idea may be repeated a constant
number of times; this is how we achieve our final bound. That is, we show that approximating the $L_1$ norm of
implicit vectors is practically equivalent to finding heavy hitters. Our method is quite general and works for
any implicit vector. Further, the simplest variant of the argument requires only pairwise independence, giving
an algorithm that requires only 4-wise independence, in contrast to existing methods that use pseudorandom
generators.

We give a simple analysis that uses the Chebyshev inequality. Better bounds are possible. For instance,
assuming total randomness of $H$, we can apply tail bounds such as the Hoeffding bound or the Bernstein inequality.
For our purposes, even Chebyshev-like bounds are sufficient, so we present only these bounds here. Also,
pairwise independence allows us to simplify algorithms by avoiding pseudorandom generators.

1.1 Roadmap 

In Section 2 we introduce the basic argument, and in Section 3 we extend it to a special case suitable for
streaming applications. In Section 4 we describe a generic algorithm for recursive computations. In Section
5 we use our method to obtain a better upper bound for the problem of frequency moments.

2 Recursive Sketches 

In this paper we denote by $|V|$ the $L_1$ norm of $V$, i.e., $|V| = \sum_{j \in [n]} v_j$.

Definition 2.1. Major elements

Let $V$ be a vector of dimensionality $n$ with non-negative entries $v_i \ge 0$. Let $0 < \alpha \le 1$. An element $v_i$ is
$\alpha$-major with respect to $V$ if $v_i \ge \alpha|V|$. A set $S \subseteq [n]$ is an $\alpha$-core w.r.t. $V$ if $i \in S$ for any $\alpha$-major $v_i$.

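Lemma 2.2. Let $V$ be a fixed vector with non-negative entries and let $S$ be an $\alpha$-core w.r.t. $V$. Let $H$ be a random vector with
uniform zero-one entries $h_i$, $i \in [n]$, that are pairwise-independent. Define

$$X = \sum_{i \in S} v_i + 2 \sum_{i \notin S} h_i v_i.$$

Then $P(|X - |V|| \ge \epsilon|V|) \le \frac{\alpha}{\epsilon^2}$.

Proof. Clearly, $E(X) = |V|$. By the properties of variance, by pairwise independence of the $h_i$ and by the
definition of an $\alpha$-core (every $v_i$ with $i \notin S$ is smaller than $\alpha|V|$):

$$Var(X) = 4 \sum_{i \notin S} v_i^2\, Var(h_i) = \sum_{i \notin S} v_i^2 \le \alpha|V| \sum_{i \notin S} v_i \le \alpha|V|^2.$$

Thus, by the Chebyshev inequality:

$$P(|X - |V|| \ge \epsilon|V|) \le \frac{\alpha}{\epsilon^2}.$$

□

Corollary 2.3. Let $V$ be a random vector with non-negative entries and let $S$ be an $\alpha$-core w.r.t. $V$. Let $H$ be a random vector
independent of $V$ and $S$ with uniform zero-one entries $h_i$, $i \in [n]$, that are pairwise-independent. Define

$$X = \sum_{i \in S} v_i + 2 \sum_{i \notin S} h_i v_i.$$

Then

$$P(|X - |V|| \ge \epsilon|V|) \le \frac{\alpha}{\epsilon^2}.$$

Proof. For any fixed $V$ and $S$ the claim is true by Lemma 2.2, since $H$ is independent of $V$ and $S$.
Thus, the corollary follows. □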
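As a sanity check on Lemma 2.2, the following sketch computes the exact mean and variance of $X$ by enumerating every zero-one vector on the coordinates outside the core. It assumes fully independent uniform bits (a special case of pairwise independence) and uses 0-based indices; the function names and the example vector are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def alpha_core(v, alpha):
    """Return the set of indices i whose entry is alpha-major: v_i >= alpha * |V|."""
    total = sum(v)
    return {i for i, x in enumerate(v) if x >= alpha * total}


def estimator_moments(v, core):
    """Exact mean and variance of X = sum_{i in S} v_i + 2 * sum_{i not in S} h_i v_i.

    The bits h_i are enumerated exhaustively, i.e. they are assumed to be
    fully independent and uniform, which implies pairwise independence.
    """
    outside = [i for i in range(len(v)) if i not in core]
    inside_sum = sum(v[i] for i in core)
    values = []
    for bits in product((0, 1), repeat=len(outside)):
        values.append(inside_sum + 2 * sum(b * v[i] for b, i in zip(bits, outside)))
    mean = Fraction(sum(values), len(values))
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, var


# Illustrative vector with |V| = 85; with alpha = 1/5 the core is {0, 1}.
v = [50, 20, 5, 4, 3, 2, 1]
alpha = Fraction(1, 5)
core = alpha_core(v, alpha)
mean, var = estimator_moments(v, core)
```

For this vector the mean equals $|V| = 85$ and the variance equals the sum of squares outside the core, $25 + 16 + 9 + 4 + 1 = 55$, well below $\alpha|V|^2 = 1445$.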

2.0.1 Recursive Computations 

Let $\phi$ be a parameter. Let $H_1, \dots, H_\phi$ be i.i.d. random vectors with zero-one entries that are uniformly
distributed and pairwise independent. For two vectors of dimensionality $n$ define $Had(V, U)$ to be their
Hadamard product; i.e., $Had(V, U)$ is a vector of dimensionality $n$ with entries $v_i u_i$. Define:

$$V_0 = V, \quad \text{and} \quad V_j = Had(V_{j-1}, H_j) \quad \text{for } j = 1, \dots, \phi.$$

Denote by $v_i^j$ and $h_i^j$ the $i$-th entry of $V_j$ and $H_j$, respectively. Let $S_0, \dots, S_\phi$ be a sequence of subsets of $[n]$
such that $S_j$ is an $\alpha$-core of $V_j$. Define the sequence

$$X_j = \sum_{i \in S_j} v_i^j + 2 \sum_{i \notin S_j} h_i^{j+1} v_i^j, \quad j = 0, \dots, \phi - 1,$$

and $X_\phi = |V_\phi|$.
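Fact 2.4.

$$P\left(\bigcup_{j=0}^{\phi} \left(|X_j - |V_j|| \ge \epsilon|V_j|\right)\right) \le \frac{(\phi + 1)\alpha}{\epsilon^2}.$$

Proof. Consider a fixed $j < \phi$. It follows from the definitions that $H_{j+1}$ is independent of $V_j$ and $S_j$. Applying
Corollary 2.3 and the union bound we obtain the proof. □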

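Consider the following recursive definition:

$$Y_\phi = X_\phi, \qquad Y_j = 2Y_{j+1} + \sum_{i \in S_j} (1 - 2h_i^{j+1}) v_i^j, \quad j = 0, \dots, \phi - 1.$$

Lemma 2.5. For any $\phi$, $\gamma$, vector $V$ and sufficiently small $\alpha = \Theta(\gamma^2/\phi^3)$:

$$P(|Y_0 - |V|| \ge \gamma|V|) \le 0.2.$$

Proof. Denote $Err_j^1 = |V_j| - X_j$ and $Err_j^2 = |V_j| - Y_j$. We can rewrite

$$X_j = 2|V_{j+1}| + \sum_{i \in S_j} (1 - 2h_i^{j+1}) v_i^j.$$

Thus $X_j - Y_j = 2(|V_{j+1}| - Y_{j+1}) = 2Err_{j+1}^2$ and

$$|Err_j^2| = |Y_j - |V_j|| \le |X_j - |V_j|| + |X_j - Y_j| = |Err_j^1| + 2|Err_{j+1}^2|.$$

By definition $Err_\phi^1 = Err_\phi^2 = 0$. Thus we can rewrite:

$$|Err_0^2| \le |Err_0^1| + 2|Err_1^2| \le \dots \le \sum_{j=0}^{\phi} 2^j |Err_j^1|.$$

Choose $\epsilon = \frac{\gamma}{10(\phi + 1)}$; we have by Fact 2.4:

$$P(|Y_0 - |V|| \ge \gamma|V|) = P(|Err_0^2| \ge \gamma|V|) \le P\left(\sum_{j=0}^{\phi} 2^j |Err_j^1| \ge \gamma|V|\right) \le$$

$$P\left(\left(\sum_{j=0}^{\phi} 2^j |Err_j^1| \ge \gamma|V|\right) \cap \left(\bigcap_{j=0}^{\phi} \left(|Err_j^1| \le \epsilon|V_j|\right)\right)\right) + P\left(\bigcup_{j=0}^{\phi} \left(|X_j - |V_j|| \ge \epsilon|V_j|\right)\right) \le$$

$$P\left(\sum_{j=0}^{\phi} 2^j |V_j| \ge 10(\phi + 1)|V|\right) + \frac{(\phi + 1)\alpha}{\epsilon^2}.$$

For $j > 0$ we note that $|V_j|$ is a random variable defined as:

$$|V_j| = \sum_{i \in [n]} v_i \left(\prod_{t=1}^{j} h_i^t\right).$$

Since all $H_j$ are mutually independent, we conclude that

$$E\left(\sum_{j=0}^{\phi} 2^j |V_j|\right) = \sum_{j=0}^{\phi} 2^j \left(\sum_{i \in [n]} v_i\, E\left(\prod_{t=1}^{j} h_i^t\right)\right) = \sum_{j=0}^{\phi} 2^j \left(\sum_{i \in [n]} v_i 2^{-j}\right) = (\phi + 1)|V|.$$

Thus, by the Markov inequality, we have

$$P\left(\sum_{j=0}^{\phi} 2^j |V_j| \ge 10(\phi + 1)|V|\right) \le 0.1.$$

Also, $\frac{(\phi + 1)\alpha}{\epsilon^2} = \frac{100(\phi + 1)^3 \alpha}{\gamma^2} \le 0.1$ whenever $\alpha \le \frac{\gamma^2}{1000(\phi + 1)^3}$. Thus,

$$P(|Y_0 - |V|| \ge \gamma|V|) \le 0.2.$$

□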
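The recursion of Section 2 can be traced on an explicit vector. The sketch below is a minimal illustration under stated assumptions: the vector is given explicitly, the cores are computed exactly, and the bits of each $H_j$ come from Python's `random` module (fully random rather than merely pairwise independent). The function names are hypothetical. It returns $Y_0$ together with the sequences $X_j$ and $V_j$, so that the error decomposition from the proof of Lemma 2.5 can be inspected.

```python
import random


def l1(v):
    """L1 norm of a vector with non-negative entries."""
    return sum(v)


def alpha_core(v, alpha):
    """Exact alpha-core: all positive entries with v_i >= alpha * |V| (0-based indices)."""
    total = l1(v)
    return {i for i, x in enumerate(v) if x > 0 and x >= alpha * total}


def recursive_estimate(v, phi, alpha, rng):
    """Return (Y_0, [X_0..X_phi], [V_0..V_phi]) for the recursion of Section 2."""
    n = len(v)
    # hs[j] plays the role of H_{j+1}; fully random bits are an assumption here.
    hs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(phi)]
    vs = [list(v)]
    for h in hs:
        vs.append([a * b for a, b in zip(vs[-1], h)])  # V_j = Had(V_{j-1}, H_j)
    cores = [alpha_core(vj, alpha) for vj in vs]

    xs = []
    for j in range(phi):
        inside = sum(vs[j][i] for i in cores[j])
        outside = sum(hs[j][i] * vs[j][i] for i in range(n) if i not in cores[j])
        xs.append(inside + 2 * outside)  # X_j
    xs.append(l1(vs[phi]))  # X_phi = |V_phi|

    y = l1(vs[phi])  # Y_phi = X_phi
    for j in range(phi - 1, -1, -1):
        # Y_j = 2 Y_{j+1} + sum_{i in S_j} (1 - 2 h_i^{j+1}) v_i^j
        y = 2 * y + sum((1 - 2 * hs[j][i]) * vs[j][i] for i in cores[j])
    return y, xs, vs
```

Since $Err_j^2 = Err_j^1 + 2Err_{j+1}^2$ holds without absolute values, the returned values satisfy $|V| - Y_0 = \sum_{j} 2^j (|V_j| - X_j)$ exactly; when every positive entry belongs to the core (for instance with `alpha = 0`), the estimate equals $|V|$.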

3 An Extension: Approximate and Random Cores 

There are many ways to extend our basic result. We will explore one direction, when the cores are random
and contain approximations of heavy hitters with high probability. We consider vectors from a finite domain
$[M]^n$; this restriction simplifies the presentation.

Definition 3.1. Let $\Omega$ be a finite set of real numbers. Define $Pairs_t$ to be the set of all sets of pairs of the form:

$$\{(i_1, w_1), \dots, (i_t, w_t)\}, \quad 1 \le i_1 < i_2 < \dots < i_t \le n,\ i_j \in N,\ w_j \in \Omega.$$

Further define

$$Pairs = \{\emptyset\} \cup \left(\bigcup_{t=1}^{n} Pairs_t\right).$$

Definition 3.2. A non-empty set $Q \in Pairs_t$, i.e., $Q = \{(i_1, w_1), \dots, (i_t, w_t)\}$ for some $t \in [n]$, is an
$(\alpha, \epsilon)$-cover w.r.t. a vector $V \in [M]^n$ if the following is true:

1. $\forall j \in [t]$: $(1 - \epsilon) v_{i_j} \le w_j \le (1 + \epsilon) v_{i_j}$.

2. $\forall i \in [n]$: if $v_i$ is $\alpha$-major then $\exists j \in [t]$ such that $i_j = i$.
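The two conditions of Definition 3.2 translate directly into a predicate. The following is a minimal sketch assuming 0-based indices and a list of (index, weight) pairs with strictly increasing indices; `is_major` and `is_cover` are hypothetical helper names.

```python
def is_major(v, i, alpha):
    """True if v_i is alpha-major w.r.t. V, i.e. v_i >= alpha * |V|."""
    return v[i] >= alpha * sum(v)


def is_cover(q, v, alpha, eps):
    """Check whether q = [(i_1, w_1), ..., (i_t, w_t)] is an (alpha, eps)-cover of v."""
    if not q:
        return False  # a cover is non-empty by definition
    indices = [i for i, _ in q]
    if any(a >= b for a, b in zip(indices, indices[1:])):
        return False  # indices must be strictly increasing
    # Condition 1: every reported weight is a (1 +/- eps) approximation.
    for i, w in q:
        if not ((1 - eps) * v[i] <= w <= (1 + eps) * v[i]):
            return False
    # Condition 2: every alpha-major entry is reported.
    covered = set(indices)
    return all(i in covered for i in range(len(v)) if is_major(v, i, alpha))
```

For example, with $V = (100, 40, 5, 5)$ and $\alpha = 0.25$ the major entries are the first two, so any cover must report both of them; additional non-major entries are allowed as long as their weights are accurate.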

Definition 3.3. Let $\mathcal{D}$ be a probability distribution on $Pairs$. Let $V \in [M]^n$ be a fixed vector. We say that $\mathcal{D}$
is $\delta$-good w.r.t. $V$ if for a random element $Q$ of $Pairs$ with distribution $\mathcal{D}$ the following is true:

$$P(Q \text{ is an } (\alpha, \epsilon)\text{-cover of } V) \ge 1 - \delta.$$

Definition 3.4. Let $g$ be a mapping from $[M]^n$ to the set of all distributions on $Pairs$. We say that $g$ is $\delta$-good
if for any fixed $V \in [M]^n$ the distribution $g(V)$ is $\delta$-good w.r.t. $V$. Intuitively, $g$ represents the output of an
algorithm that finds heavy hitters (and their approximations) of an input vector $V$ w.p. $1 - \delta$.

Definition 3.5. For non-empty $Q \in Pairs$ define $Ind(Q)$ to be the set of indexes of $Q$. Formally, for
$Q \in Pairs$, denote $Ind(Q) = \{i : \exists j \le t \text{ such that for the } j\text{-th pair } (i_j, w_j) \text{ of } Q \text{ it is true that } i_j = i\}$.
For $i \in Ind(Q)$ denote by $w_Q(i)$ the corresponding approximation, i.e., if $i = i_j$ then $w_Q(i) = w_j$. (Note
that since $i_j < i_{j+1}$ this is a valid definition.) For completeness, denote $w_Q(i) = 0$ for $i \notin Ind(Q)$ and
$Ind(\emptyset) = \emptyset$.

Now we are ready to repeat the arguments from the previous section. 

Corollary 3.6. Let $V \in [M]^n$ be a random vector. Let $g$ be a $\delta$-good mapping and let $Q$ be a random element
of $Pairs$ that is chosen according to the distribution $g(V)$. Let $H$ be a random vector independent of $V$ and $Q$
with uniform zero-one entries $h_i$, $i \in [n]$, that are pairwise-independent. Define

$$X' = \sum_{i \in Ind(Q)} v_i + 2 \sum_{i \notin Ind(Q)} h_i v_i.$$

Then

$$P(|X' - |V|| \ge \epsilon|V|) \le \frac{\alpha}{\epsilon^2} + \delta.$$

Proof. Consider a fixed vector $V_0$ and the event that $V = V_0$. Conditioned on this event, the distribution
$g(V)$ is fixed and $\delta$-good w.r.t. $V_0$. Consider the event that $Q = Q_0$, where $Q_0$ is an $(\alpha, \epsilon)$-cover w.r.t. $V_0$.
Conditioned on this event, $Ind(Q)$ is an $\alpha$-core w.r.t. $V_0$. Since $H$ is independent of $Q$, the claim is true for
any such $V_0$ by Lemma 2.2 and by the union bound. Thus, the corollary follows. □

3.0.2 Recursive Computations

Let $\phi$ be a parameter. Let $H_1, \dots, H_\phi$ be i.i.d. random vectors with zero-one entries that are uniformly
distributed and pairwise independent. Define:

$$V_0 = V, \quad \text{and} \quad V_j = Had(V_{j-1}, H_j) \quad \text{for } j = 1, \dots, \phi.$$

Denote by $v_i^j$ and $h_i^j$ the $i$-th entry of $V_j$ and $H_j$, respectively. Let $g$ be a $\delta$-good mapping and let $Q_j$ be a
random element of $Pairs$ with distribution $g(V_j)$. Define $w_j(i) = w_{Q_j}(i)$. Define the sequence:

$$X'_j = \sum_{i \in Ind(Q_j)} v_i^j + 2 \sum_{i \notin Ind(Q_j)} h_i^{j+1} v_i^j, \quad j = 0, \dots, \phi - 1,$$

and $X'_\phi = |V_\phi|$. From Corollary 3.6 and by repeating the arguments from Fact 2.4 we obtain

Fact 3.7.

$$P\left(\bigcup_{j=0}^{\phi} \left(|X'_j - |V_j|| \ge \epsilon|V_j|\right)\right) \le (\phi + 1)\left(\frac{\alpha}{\epsilon^2} + \delta\right).$$

Consider the following recursive definition. Let $Y'_\phi = Y'_\phi(V_\phi)$ be a random variable that depends on the random
vector $V_\phi$, and such that for any fixed $V_\phi$:

$$P(|Y'_\phi - |V_\phi|| \ge \epsilon|V_\phi|) \le \delta.$$

Also, define for $j = 0, \dots, \phi - 1$:

$$Y'_j = 2Y'_{j+1} + \sum_{i \in Ind(Q_j)} (1 - 2h_i^{j+1}) w_j(i).$$

Lemma 3.8. For any $\phi$, $\gamma$, vector $V$, and for sufficiently small $\alpha = \Theta(\gamma^2/\phi^3)$ and $\delta = \Theta(1/\phi)$:

$$P(|Y'_0 - |V|| \ge \gamma|V|) \le 0.2.$$

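Proof. Denote $Err_j^1 = |V_j| - X'_j$, $Err_j^2 = |V_j| - Y'_j$ and $Err_j^3 = \sum_{i \in Ind(Q_j)} |w_j(i) - v_i^j|$. We can rewrite

$$X'_j = 2|V_{j+1}| + \sum_{i \in Ind(Q_j)} (1 - 2h_i^{j+1}) v_i^j.$$

Thus $|X'_j - Y'_j| \le 2|Err_{j+1}^2| + |Err_j^3|$ and

$$|Err_j^2| = |Y'_j - |V_j|| \le |X'_j - |V_j|| + |X'_j - Y'_j| \le |Err_j^1| + |Err_j^3| + 2|Err_{j+1}^2|.$$

Thus we can rewrite:

$$|Err_0^2| \le |Err_0^1| + |Err_0^3| + 2|Err_1^2| \le \dots \le 2^\phi |Err_\phi^2| + \sum_{j=0}^{\phi} 2^j |Err_j^1| + \sum_{j=0}^{\phi} 2^j |Err_j^3|.$$

Choose $\epsilon = \frac{\gamma}{30(\phi + 1)}$ and denote $Z = 2^\phi |Err_\phi^2| + \sum_{j=0}^{\phi} 2^j |Err_j^1| + \sum_{j=0}^{\phi} 2^j |Err_j^3|$. Then

$$P(|Y'_0 - |V|| \ge \gamma|V|) = P(|Err_0^2| \ge \gamma|V|) \le P(Z \ge \gamma|V|) \le$$

$$P\left[(Z \ge \gamma|V|) \cap \left(\bigcap_{j=0}^{\phi} \left(|Err_j^1| \le \epsilon|V_j|\right)\right) \cap \left(\bigcap_{j=0}^{\phi} \left(|Err_j^3| \le \epsilon|V_j|\right)\right) \cap \left(|Err_\phi^2| \le \epsilon|V_\phi|\right)\right] +$$

$$P\left(|Err_\phi^2| \ge \epsilon|V_\phi|\right) + P\left(\bigcup_{j=0}^{\phi} \left(|Err_j^1| \ge \epsilon|V_j|\right)\right) + P\left(\bigcup_{j=0}^{\phi} \left(|Err_j^3| \ge \epsilon|V_j|\right)\right).$$

Note that by the definition of $Y'_\phi$, we have $P(|Err_\phi^2| \ge \epsilon|V_\phi|) \le \delta$. Also, by the definition of $Q_j$ and the union
bound,

$$P\left(\bigcup_{j=0}^{\phi} \left(|Err_j^3| \ge \epsilon|V_j|\right)\right) \le (\phi + 1)\delta.$$

Thus, and by Fact 3.7,

$$P(|Y'_0 - |V|| \ge \gamma|V|) \le P\left(\sum_{j=0}^{\phi} 2^j |V_j| \ge 10(\phi + 1)|V|\right) + (\phi + 2)\left(\frac{\alpha}{\epsilon^2} + 2\delta\right).$$

The lemma follows by repeating the concluding arguments from Lemma 2.5. □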

4 A Generic Algorithm 

Let $D$ be a stream as in Definition 1.1. For a function $H : [n] \to \{0, 1\}$, define $D_H$ to be the sub-stream
of $D$ that contains only elements $p \in D$ such that $H(p) = 1$. Let $V = V(D)$ be an implicit vector of
dimensionality $n$ defined by a stream, e.g., a frequency moment vector from Definition 1.1. We say that a
vector $V$ is separable if for any $H$, we have $Had(V(D), H) = V(D_H)$. Let $HH(D, \alpha, \epsilon, \delta)$ be an algorithm
that produces an $(\alpha, \epsilon)$-cover w.r.t. $V(D)$ w.p. $1 - \delta$, i.e., produces a $\delta$-good distribution w.r.t. $V(D)$ for some
suitable finite set of $Pairs$, as defined in Definition 3.1.
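Separability can be checked directly for the frequency moment vector with entries $f_i^k$. The sketch below is illustrative only: it represents $H$ as a dictionary from elements of $[n]$ to bits and computes everything exactly, and the helper names are assumptions.

```python
from collections import Counter


def substream(stream, h):
    """D_H: keep only the elements p of the stream with H(p) = 1."""
    return [p for p in stream if h[p] == 1]


def moment_vector(stream, n, k):
    """Implicit vector V(D) with entries f_i^k for i = 1..n (k >= 1)."""
    counts = Counter(stream)
    return [counts.get(i, 0) ** k for i in range(1, n + 1)]


def hadamard(v, u):
    """Entry-wise (Hadamard) product of two vectors of equal length."""
    return [a * b for a, b in zip(v, u)]


# Illustrative data: n = 4, k = 3, and H keeps elements 1 and 3.
stream = [1, 3, 3, 2, 3, 1, 4]
h = {1: 1, 2: 0, 3: 1, 4: 0}
h_vector = [h[i] for i in range(1, 5)]
lhs = hadamard(moment_vector(stream, 4, 3), h_vector)
rhs = moment_vector(substream(stream, h), 4, 3)
```

Both sides equal $(8, 0, 27, 0)$: filtering the stream by $H$ zeroes exactly the coordinates that the Hadamard product zeroes.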



Algorithm 4.1. Recursive Sum[0]($D$, $\epsilon$)

1. Generate $\phi = O(\log(n))$ pairwise independent zero-one vectors $H_1, \dots, H_\phi$. Denote by $D_j$ the
stream $D_{H_1 H_2 \dots H_j}$.

2. Compute, in parallel, random cores $Q_j = HH(D_j, \frac{\epsilon^2}{\phi^3}, \epsilon, \frac{1}{\phi})$.

3. If $F_0(V_\phi) > 10^{10}$ then output 0 and stop. Otherwise compute precisely $Y_\phi = |V_\phi|$.

4. For each $j = \phi - 1, \dots, 0$, compute $Y_j = 2Y_{j+1} + \sum_{i \in Ind(Q_j)} (1 - 2h_i^{j+1}) w_{Q_j}(i)$.

5. Output $Y_0$.



Theorem 4.2. Algorithm 4.1 computes a $(1 \pm \epsilon)$-approximation of $|V|$ and errs w.p. at most 0.3. The algorithm
uses $O(\log(n)\, \mu(n, \frac{\epsilon^2}{\log^3(n)}, \epsilon, \frac{1}{\log(n)}))$ memory bits, where $\mu$ is the space required by the above algorithm
$HH$.

Proof. The correctness follows directly from the description of the algorithm, Lemma 3.8 and the Markov
inequality. The memory bound follows by direct computation. □
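The data flow of the generic algorithm can be illustrated offline. The following sketch is not a small-space implementation: it keeps every sub-stream in memory, replaces $HH$ by a hypothetical exact oracle `exact_heavy_hitters`, draws fully random bits instead of pairwise independent ones, and omits the early-stop test on $F_0(V_\phi)$. It uses the recursion $Y_j = 2Y_{j+1} + \sum_{i \in Ind(Q_j)} (1 - 2h_i^{j+1}) w_{Q_j}(i)$ from Section 3 on the vector with entries $f_i^k$.

```python
import random
from collections import Counter


def exact_heavy_hitters(stream, k, alpha):
    """Hypothetical stand-in for HH: exact weights f_i^k of all alpha-major entries."""
    counts = Counter(stream)
    total = sum(c ** k for c in counts.values())
    return {i: c ** k for i, c in counts.items() if c ** k >= alpha * total}


def recursive_sum(stream, n, k, phi, alpha, seed=0):
    """Estimate F_k of the stream with phi levels of random sub-sampling."""
    rng = random.Random(seed)
    # hs[j][i] plays the role of h_i^{j+1}; fully random bits are an assumption.
    hs = [{i: rng.randint(0, 1) for i in range(1, n + 1)} for _ in range(phi)]

    # streams[j] is D_j, the sub-stream that survives H_1, ..., H_j.
    streams = [list(stream)]
    for h in hs:
        streams.append([p for p in streams[-1] if h[p] == 1])

    cores = [exact_heavy_hitters(streams[j], k, alpha) for j in range(phi)]

    # Y_phi = |V_phi|, computed exactly at the bottom of the recursion.
    y = sum(c ** k for c in Counter(streams[phi]).values())
    for j in range(phi - 1, -1, -1):
        y = 2 * y + sum((1 - 2 * hs[j][i]) * w for i, w in cores[j].items())
    return y
```

With `alpha = 0.0` the oracle reports every element that occurs, and the recursion then returns $F_k$ exactly regardless of the random bits; larger values of `alpha` trade accuracy for smaller cores.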
5 Approximating Large Frequency Moments on Streams

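We apply the technique developed above to the problem of frequency moments.

Fact 5.1. Let $V$ be a vector of dimensionality $n$ with non-negative entries and let $n_0$ be the number of non-zero
entries in $V$. Let $0 < \alpha < 1$ and let $v_i$ be such that $v_i^k \ge \alpha \sum_{j \in [n]} v_j^k$. Then $v_i^2 \ge 0.5\, \alpha^{2/k}\, n_0^{(2/k) - 1} \sum_{j \ne i} v_j^2$.

Proof. If $n_0 = 0$ the fact is trivial. Otherwise, by Hölder's inequality, $\sum_{j \ne i} v_j^2 \le n_0^{1 - 2/k} \left(\sum_{j \in [n]} v_j^k\right)^{2/k} \le n_0^{1 - 2/k}\, \alpha^{-2/k}\, v_i^2$. □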
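A quick numeric check of Fact 5.1 on explicit vectors may help in reading the exponents. This is a minimal sketch using floating-point arithmetic and 0-based indices; `fact_5_1_holds` is a hypothetical helper name, and it raises an error when the premise of the fact is not satisfied.

```python
def fact_5_1_holds(v, i, k, alpha):
    """Check v_i^2 >= 0.5 * alpha^(2/k) * n0^(2/k - 1) * sum_{j != i} v_j^2.

    Raises ValueError if the premise v_i^k >= alpha * sum_j v_j^k fails.
    """
    n0 = sum(1 for x in v if x > 0)  # number of non-zero entries
    if v[i] ** k < alpha * sum(x ** k for x in v):
        raise ValueError("premise of Fact 5.1 does not hold for this input")
    rest = sum(x ** 2 for j, x in enumerate(v) if j != i)
    return v[i] ** 2 >= 0.5 * alpha ** (2 / k) * n0 ** (2 / k - 1) * rest
```

For instance, for $V = (4, 4, 4, 4)$, $k = 4$ and $\alpha = 0.25$, the right-hand side is $0.5 \cdot 0.5 \cdot 0.5 \cdot 48 = 6$, which is below $v_i^2 = 16$.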

The famous Count-Sketch [11] algorithm finds all $\alpha$-heavy elements. In particular, the following is a
corollary from [11].

Theorem 5.2. (from [11]) Let $a_t$ be the frequency of the $t$-th most frequent element. There exists an algorithm
that w.p. $1 - \delta$ outputs $t$ pairs $(i, f'_i)$ such that $(1 - \epsilon) f_i \le f'_i \le (1 + \epsilon) f_i$ and such that all elements with
$f_i \ge (1 - \epsilon) a_t$ appear in the list. The algorithm uses $O\left(\left(t + \frac{\sum_{i : f_i < a_t} f_i^2}{(\epsilon a_t)^2}\right) \log(m/\delta) \log(m)\right)$ memory bits.

Combining with Fact 5.1 we obtain

Corollary 5.3. There exists an algorithm that w.p. $1 - \delta$ outputs $O(\alpha^{-1})$ pairs $(i, f'_i)$ such that $(1 - \epsilon) f_i^k \le
(f'_i)^k \le (1 + \epsilon) f_i^k$ and such that all elements with $f_i^k \ge \alpha \sum_{j \in [n]} f_j^k$ appear in the list. The algorithm uses
$O((\alpha^{-1} + \epsilon^{-2} \alpha^{-2/k} n^{1 - 2/k}) \log(m/\delta) \log(m))$ memory bits.
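For readers unfamiliar with Count-Sketch, the following is a minimal sketch of its update and point-query operations. It is illustrative only: the bucket and sign functions are stored as explicit random tables (the real algorithm uses compact pairwise independent hash functions, so this version does not save space), the heavy-hitter list maintenance is omitted, and the class and parameter names are assumptions.

```python
import random
from statistics import median


class CountSketch:
    """Toy Count-Sketch over items 1..n with `rows` independent hash rows."""

    def __init__(self, rows, buckets, n, seed=0):
        rng = random.Random(seed)
        # Explicit random tables stand in for pairwise independent hash functions.
        self.bucket = [[rng.randrange(buckets) for _ in range(n + 1)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n + 1)] for _ in range(rows)]
        self.table = [[0] * buckets for _ in range(rows)]

    def update(self, item, delta=1):
        """Process `delta` occurrences of `item`."""
        for r, row in enumerate(self.table):
            row[self.bucket[r][item]] += self.sign[r][item] * delta

    def estimate(self, item):
        """Median over rows of the signed counter that `item` hashes to."""
        return median(self.sign[r][item] * row[self.bucket[r][item]]
                      for r, row in enumerate(self.table))
```

When the stream contains a single distinct element, every row recovers its count exactly, so the estimate equals the true frequency; with many elements the estimate is accurate for heavy elements with the probability guarantees of Theorem 5.2.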

The algorithm from Corollary 5.3 defines a $\delta$-good distribution w.r.t. the input vector $V(D)$ over
some finite set $\Omega$ (see footnote 2) from Definition 3.1. Denote the algorithm from Corollary 5.3 by $CS(D, \alpha, \epsilon, \delta)$. Thus,
combining with Algorithm 4.1, it gives an algorithm that errs w.p. $\delta$, outputs a $(1 \pm \epsilon)$-approximation of $F_k$ and
uses $O(\epsilon^{-2-4/k}\, n_0^{1-2/k} \log(mn) \log(m) \log^{1+6/k}(n) \log(1/\delta))$ memory bits, nearly matching the bound of Bhuvanagiri, Ganguly, Kesh and Saha.
Denote this algorithm by $A_0(D, \epsilon, \delta)$. We can improve the bound further recursively:

Algorithm 5.4. Recursive $F_k$[1]($D$, $\epsilon$)

1. Generate $\phi = O(\log\log(n))$ pairwise independent zero-one vectors $H_1, \dots, H_\phi$. Denote by $D_j$ the
stream $D_{H_1 H_2 \dots H_j}$.

2. Compute, in parallel, $Q_j = CS(D_j, \frac{\epsilon^2}{\phi^3}, \epsilon, \frac{1}{\phi})$.

3. Compute $Y_\phi = A_0(D_\phi, \epsilon, 0.1)$.

4. For each $j = \phi - 1, \dots, 0$, compute $Y_j = 2Y_{j+1} + \sum_{i \in Ind(Q_j)} (1 - 2h_i^{j+1}) w_{Q_j}(i)$.

5. Output $Y_0$.





There exists a constant $c$ such that for $\phi = c \log\log(n)$, except with a small constant probability,
$F_0(D_\phi) \le \frac{n}{\log^{10}(n)}$. Thus, executing $A_0$ for $n' = \frac{n}{\log^{10}(n)}$ we obtain an approximation of $F_k(D_\phi)$ using
$O(\epsilon^{-2-4/k}\, n^{1-2/k} \log(mn) \log(m))$ memory bits. Since $\phi = O(\log\log(n))$, the complexity of the new
algorithm becomes $O(\epsilon^{-2-4/k}\, n^{1-2/k} \log(mn) \log(m) (\log\log(n))^4)$. Repeating this argument a constant number
of times we arrive at:



2 Indeed, we can define the finite set $\Omega$ from Definition 3.1 as the set of all possible outputs of Count-Sketch executed over all vectors
on $[m]^n$. This is a finite set (for finite $n$, $m$) and thus we can define $Pairs$ accordingly.

Theorem 5.5. Define $g_1(n) = \log(n)$ and $g_t(n) = \log(g_{t-1}(n))$. For any constant $t$ there
exists an algorithm that computes a $(1 \pm \epsilon)$-approximation of $F_k(D)$, errs with small constant probability and uses
$O(c_t k^2 \epsilon^{-2-(4/k)} n^{1-2/k} g_t(n) \log(m) \log(nm))$ memory bits, where $c_t$ is a constant that depends on $t$.
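To see how quickly the factor $g_t(n)$ shrinks, the iterated logarithm can be evaluated directly. The sketch below assumes base-2 logarithms (the base is not fixed in the text and only affects constants) and follows the convention $g_0(n) = n$; the function name `g` mirrors the notation of the theorem.

```python
import math


def g(t, n):
    """Iterated logarithm: g_0(n) = n and g_t(n) = log2(g_{t-1}(n)).

    Base 2 is an assumption; requires every intermediate value to be positive.
    """
    value = float(n)
    for _ in range(t):
        value = math.log2(value)
    return value
```

For $n = 2^{16}$ the values are $g_1 = 16$, $g_2 = 4$ and $g_3 = 2$, so a few iterations already reduce the factor to a small constant.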

We note also that it is possible to reduce the complexity to $O(n^{1-2/k} g_t(n) \log(n) (\log(n) + \log\log(m)))$,
at least for constant $\epsilon$, using, instead of CountSketch, the variant of the AMS sketch and the ideas from [17].